System and method for managing virtual hard disks in cloud environments

ABSTRACT

A system, method, and computer-readable storage medium for managing virtual hard disks in a cloud computing/storage environment are provided. The method includes associating, using a virtual hard disk (VHD) management system of a server device, a plurality of data blocks of a virtual hard disk stored at a cloud vendor to a corresponding plurality of cloud objects. A plurality of cloud object identifiers associated with the plurality of cloud objects in a first cloud allocation table are stored. Changes to one or more data blocks are determined. Corresponding new cloud allocation tables for every data block in the plurality of data blocks that has changed are formed, the new cloud allocation tables having corresponding new cloud object identifiers. The first and the new cloud allocation tables are downloaded and merged to form an updated cloud allocation table. The updated cloud allocation table is uploaded to the cloud vendor.

FIELD

The invention relates generally to the field of cloud computing/storage systems and more particularly to efficiently and cost-effectively managing virtual hard disks in cloud environments.

BACKGROUND

Cloud computing/storage environments have changed the way in which business organizations assess the requirements and capacity needed to implement their data processing needs. A cloud computing/storage environment includes capabilities where the cloud provider hosts the hardware and related items and provides systems and computational power as a service to a customer (e.g., a business organization). When implementing data processing needs via a cloud vendor, a customer does not need to bear the cost of space, energy, and maintenance in order to acquire the required computational resources at a reasonable cost, and can back up data to a cloud vendor's storage facility or device.

Cloud computing/storage environments support virtual machines (VMs), which may be defined as emulations of physical machines in software, hardware, or a combination of both. A set of services or resources may form a virtual machine image that has associated recovery points or snapshots. A recovery point or snapshot of a virtual machine (VM) is a point-in-time copy of the virtual machine. In a typical scenario, recovery points or snapshots of a virtual machine can be copied and stored in a cloud computing/storage environment. Recovery points are created at regular intervals, and the data stored at each recovery point comprises one or more virtual hard disks (VHDs) that serve as hard disks for the virtual machine and are stored as files in the cloud computing/storage environment. Conventionally, to merge or consolidate these recovery points or snapshots (for example, when two or more different recovery points are to be merged), the VHDs of the virtual machine associated with the recovery points are downloaded from the cloud environment and then merged locally. The merged VHDs are then uploaded into the cloud environment again. Unfortunately, such downloading and uploading of snapshots in the form of VHDs is expensive and time consuming. These and other drawbacks exist.

SUMMARY

In some implementations, these and other drawbacks of existing systems are addressed by a system, method, and computer-readable storage medium having one or more computer-readable instructions thereon for managing virtual hard disks in a cloud computing/storage environment. The method includes associating, using a virtual hard disk (VHD) management system of a server device, a plurality of data blocks of a virtual hard disk stored at a cloud vendor to a corresponding plurality of cloud objects. A plurality of cloud object identifiers associated with the plurality of cloud objects are stored in a first cloud allocation table. Changes to one or more data blocks in the plurality of data blocks are determined. Corresponding new cloud allocation tables are formed for every data block in the plurality of data blocks that has changed, the new cloud allocation tables having corresponding new cloud object identifiers. The first and the new cloud allocation tables are downloaded. The first and the new cloud allocation tables are merged to form an updated cloud allocation table. The updated cloud allocation table is uploaded to the cloud vendor such that the updated cloud allocation table includes information regarding the changed data blocks in the plurality of data blocks.

Various other objects, features, and advantages of the invention will be apparent through the detailed description and the drawings attached hereto. It is also to be understood that both the foregoing general description and the following detailed description are exemplary and not restrictive of the scope of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of an example environment for managing virtual hard disks in a cloud computing/storage environment, according to various implementations of the invention.

FIG. 2 is an illustration of a conventional format for a dynamic virtual hard disk.

FIG. 3 is an illustration of a modified format of a dynamic virtual hard disk, according to various implementations of the invention.

FIG. 4 is an illustration of a format for a header field for a recovery point, according to various implementations of the invention.

FIG. 5 is an illustration of a disk information field of the dynamic VHD, according to various implementations of the invention.

FIG. 6 is an illustration of a structure of a cloud allocation table, according to various implementations of the invention.

FIG. 7 illustrates a flowchart of a process for merging cloud allocation tables, according to various implementations of the invention.

FIG. 8 illustrates a flowchart of a process for creating a cloud allocation table at a new or latest recovery point or snapshot, according to various implementations of the invention.

FIG. 9 illustrates an exemplary scenario in which a merge happens, according to various implementations of the invention.

FIGS. 10A and 10B illustrate examples of cloud allocation tables of a base virtual hard disk and a child virtual hard disk, according to various implementations of the invention.

FIG. 10C illustrates an example of a cloud allocation table of a consolidated base disk after a merge operation, according to various implementations of the invention.

DETAILED DESCRIPTION OF THE INVENTION

It is to be noted that the following definitions are included solely for illustration purposes; these definitions are indicative with respect to the implementations described herein and are not meant to be exhaustive or restrictive in nature.

In some implementations, a virtual machine (VM) is defined as an emulation or implementation of an actual machine, e.g., a computer. In some implementations, the VM may be software that simulates the computer or any other machine.

In some implementations, a VM has an image that is a snapshot of a resource or service provided and managed in a cloud computing/storage environment. A virtual machine image may include one or more VHDs.

In some implementations of this invention, a VHD is defined as a file format that may contain what is found on a physical hard disk drive, such as disk partitions and a file system, which in turn can contain files and folders. A VHD is typically used as the hard disk of a virtual machine.

In some implementations, a cloud object defines a basic unit of storage in a cloud computing/storage environment.

In some implementations, a recovery point or a snapshot is defined as a point-in-time copy of the VM that may include a point-in-time state of the VM.

In some implementations, an image is defined as a virtual machine image that is a collection of resources or services available for use by a customer in a cloud computing/storage environment.

In some implementations, a block allocation table is defined as a table of absolute sector offsets into a file backing a hard disk of a computer system.

In some implementations, a data block is defined as a sequence of bytes or bits.

In some implementations, a cloud object identifier is defined as a variable or field that identifies a cloud object. For example, a blob name for AZURE® provided by Microsoft Corporation of Redmond, Wash. and an object name for S3® provided by Amazon.com, Inc. of Seattle, Wash. are cloud object identifiers.

In some implementations, in the context of a VHD, a dynamic disk may at a given time be only as large as the actual data written to it plus the size of its header and footer. Data may be allocated in blocks such that, as more data is written, a file associated with the dynamic disk dynamically increases in size by allocating more blocks.

In some implementations, a differencing disk is defined as a virtual hard disk used to isolate changes to the VHD or a guest operating system by storing them in a separate file.

In some implementations, a sector map for dynamic disks is defined as a bitmap that indicates which sectors contain valid data (indicated by binary 1's) and which sectors have never been modified (indicated by binary 0's). For differencing disks, the sector bitmap indicates which sectors are located within the differencing disk (indicated by binary 1's) and which sectors are in the parent (indicated by binary 0's).
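The sector bitmap semantics for a chain of differencing disks can be illustrated with a short Python sketch. It assumes one bitmap per disk for the block being read and assumes that the most significant bit of each byte stands for the lowest-numbered sector; the helper names `set_sector`, `sector_is_set`, and `resolve_sector` are hypothetical and not part of any VHD library.

```python
from typing import Optional, Sequence


def set_sector(bitmap: bytearray, sector: int) -> None:
    """Mark `sector` as present (binary 1), most significant bit first."""
    byte, bit = divmod(sector, 8)
    bitmap[byte] |= 0x80 >> bit


def sector_is_set(bitmap: bytes, sector: int) -> bool:
    """Return True if the bit for `sector` is 1."""
    byte, bit = divmod(sector, 8)
    return bool(bitmap[byte] & (0x80 >> bit))


def resolve_sector(chain: Sequence[bytes], sector: int) -> Optional[int]:
    """Return the position in `chain` of the disk that holds `sector`.

    chain[0] is the newest differencing disk and chain[-1] is the base
    (dynamic) disk. A 1 bit means the sector is stored in that disk; a 0 bit
    means the lookup falls through to the parent. None means no disk in the
    chain has ever written the sector.
    """
    for depth, bitmap in enumerate(chain):
        if sector_is_set(bitmap, sector):
            return depth
    return None


# Base disk holds only sector 42; the child differencing disk holds only sector 1.
base = bytearray(8)   # 64 sectors, enough for this example
child = bytearray(8)
set_sector(base, 42)
set_sector(child, 1)
chain = [bytes(child), bytes(base)]
print(resolve_sector(chain, 1), resolve_sector(chain, 42), resolve_sector(chain, 5))
# prints: 0 1 None
```

Walking from the newest disk toward the base mirrors the "1 = here, 0 = ask the parent" rule stated above.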

FIG. 1 is an exemplary illustration of an environment 100, which is an example of an environment having a system for managing virtual hard disks in a cloud computing/storage environment. In some implementations, environment 100 includes, among other things, a server device 104 (interchangeably referred to herein as server 104), one or more client devices 110, one or more cloud storage service providers C1-Cn (interchangeably referred to herein as cloud vendors C1-Cn), and a network 108 for communication between various components of environment 100 via wired, wireless, optical, or other types of communication links, known to one of ordinary skill in the art.

In some implementations, server device 104 may include a hardware computing device having an operating system, disk drives, interfaces/ports, memory, buses, cooling sub-systems, and various software stored therein on tangible computer-readable media. Specifically, in some implementations, server device 104 may include a virtual hard disk (VHD) management system 106, as described below, for managing virtual hard disks stored at various cloud vendors C1-Cn in a cloud computing/storage environment, such as environment 100. In some implementations, server device 104, although shown separate from cloud vendors C1-Cn in environment 100, may be a part of one of cloud vendors C1-Cn. In some implementations, server device 104 may be a server supporting a plurality of jobs/applications for one or more client devices 110. In some implementations, server device 104 includes electronic and electrical circuitry such as processors and memory and/or other hardware operable to execute computer-readable instructions using, for example, an operating system (OS). In some implementations, server device 104 may include a security device that monitors various security aspects for an organization in environment 100. In some implementations, server device 104 may include one or more tangible computer-readable storage media configured to store one or more software modules, wherein the software modules include computer-readable instructions that, when executed by one or more processors in server device 104, may cause the processors to perform the functions related to managing virtual hard disks in a cloud computing/storage environment, as described herein. In some implementations, server device 104 may comprise computer hardware programmed with a computer application having one or more software modules that enable the various features and functions related to managing virtual hard disks in a cloud computing/storage environment (e.g., environment 100), as described herein. It will be appreciated that in some implementations server device 104 may be located remote from a physical location of the organization (e.g., on a home computer of a user within the organization's network), and various implementations of the present invention are not limited by the location of server device 104. Further, although one server device 104 is shown, in some implementations, cloud vendors C1-Cn and/or client devices 110 may communicate in parallel or in series with a plurality of different types of server devices including but not limited to mobile and desktop client computing/storage devices.

In some implementations, VHD management system 106 may be implemented, for example, using one or more programming languages such as C, Java, or other programming languages known to one of ordinary skill in the art. In some implementations, VHD management system 106 forms a system with electronic files stored in one or more memory devices of server 104 to manage snapshots of images executing at server 104. In some implementations, VHD management system 106 includes code or instructions stored on a computer-readable medium or computer-readable storage device, which, when executed by a processor, cause the processor to implement various features and functionalities including managing, storing, retrieving, and merging VHDs in cloud storage devices provided, e.g., by cloud vendors C1-Cn. In some implementations, VHD management system 106 is part of one or more memory devices in server device 104. In some implementations, VHD management system 106 is a hardware module implemented in server device 104 as an Application Specific Integrated Circuit (ASIC) with various logic circuitry integrated thereupon to implement the functionalities of VHD management system 106 discussed in FIGS. 3-10. In some implementations, VHD management system 106 is implemented using a Field Programmable Gate Array (FPGA) device. It will be appreciated that implementations of VHD management system 106 may be carried out using a combination of hardware and software, as can be contemplated by one of ordinary skill in the art in view of this disclosure. In some implementations, VHD management system 106 includes one or more cloud allocation tables (CATs) 102 (also referred to as CAT tables 102), discussed with respect to FIGS. 3-10.

In some implementations, client devices 110 interact, directly or indirectly through server device 104, with a plurality of cloud storage service providers C1-Cn via wired, wireless, optical, or other types of communication links over network 108 known to one of ordinary skill in the art. Client devices 110 are computing devices known to those of ordinary skill in the art (e.g., mobile or desktop computing devices). In some implementations, one or more client devices 110 may access resources provided by cloud vendors C1-Cn through server device 104.

In some implementations, network 108 may be the Internet or the World Wide Web (“www”). In some implementations, network 108 may be a switching fabric that is part of a Wide Area Network (WAN), a Local Area Network (LAN), or other types of networks known to those of ordinary skill in the art (e.g., a TCP/IP network). In some implementations, network 108 routes requests from server 104 and/or client devices 110 for accessing various resources.

In some implementations, a plurality of cloud vendors C1-Cn may include storage devices and hardware that may be part of or separate from one or more servers (e.g., servers S1-S6 in cloud vendor C1, servers S7-S10 in cloud vendor C2, and servers S12-S17 in cloud vendor Cn). The storage devices and hardware may store data on respective memory devices therein. The servers may be accessed by server device 104 for providing applications/services to customers at client devices 110, although other servers or devices may access servers S1-S17 for other purposes. Further, any number of servers communicably connected in known ways may be used as appropriate for cloud vendors C1-Cn, and the number and types of connections shown for servers S1-S17 in FIG. 1 are by way of example only and not by way of limitation. An example of cloud vendors C1-Cn includes cloud computing/storage services provided by Amazon.com, Inc. of Seattle, Wash., although other vendors may be used.

FIG. 2 illustrates a basic format of a dynamic virtual hard disk stored in the cloud storage environment, for example, in environment 100 at one of cloud vendors C1-Cn. In some implementations, for example in dynamic disk images, the VHD file format is represented as a file 200 that includes one or more dynamic disk header fields 202 including a copy of hard disk footer 204, a dynamic disk header 206, a block allocation table (BAT) 208, and one or more data blocks 210. For example, virtual hard disk formats supported by Microsoft Virtual PC® and Virtual Server® provided by Microsoft Corporation of Redmond, Wash. include fixed hard disk image, dynamic hard disk image, and differencing hard disk image formats, although other formats provided by other vendors may be used, as will be apparent to one of ordinary skill in the art. It is to be noted that sizes of various fields of file 200 are exemplary in nature and are not intended to be limiting, as will be appreciated by one of ordinary skill in the art. In some implementations, virtual hard disks of respective virtual machines are not stored as a single cloud unit storage object.

Conventionally, BAT 208 is a table of absolute sector offsets into file 200 backing a virtual hard disk. BAT 208 is pointed to by a “Table Offset” field (not shown) of dynamic disk header 206. The size of BAT 208 is calculated during creation of the virtual hard disk. The number of entries in BAT 208 is the number of data blocks 210 needed to store the contents of the virtual hard disk when fully expanded. For example, in some implementations, a 2-GB disk image that uses 2-MB blocks requires 1024 BAT entries, where each entry is four bytes long. In some implementations, unused table entries are initialized to a physical address 0xFFFFFFFF in a memory of server 104 or other storage devices. In some implementations, BAT 208 is extended to a sector boundary, and a field (not shown) within dynamic disk header 206 indicates how many entries are valid. Each entry in BAT 208 refers to one or more data blocks in data blocks 210 in the virtual hard disk image. In some implementations, one or more data blocks in data blocks 210 may be stored contiguously, with pointers to such contiguous memory locations stored in BAT 208. Since details of the format of file 200 are known to one of ordinary skill in the art, they will not be described in detail.
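The BAT arithmetic above can be checked with a small Python sketch. It assumes 4-byte, big-endian entries (as in Microsoft's published VHD format specification) and 512-byte sectors; the function names are hypothetical.

```python
import struct
from typing import Optional

SECTOR_SIZE = 512          # bytes per sector
UNUSED_ENTRY = 0xFFFFFFFF  # BAT value for a block that has not been allocated


def bat_entry_count(disk_size: int, block_size: int) -> int:
    """Entries needed to cover the fully expanded disk (ceiling division)."""
    return -(-disk_size // block_size)


def bat_size_bytes(entries: int) -> int:
    """Size of the BAT with 4-byte entries, padded to a sector boundary."""
    raw = entries * 4
    return -(-raw // SECTOR_SIZE) * SECTOR_SIZE


def block_sector_offset(bat: bytes, block_index: int) -> Optional[int]:
    """Return the absolute sector offset of a block, or None if unallocated.

    Entries are read as big-endian unsigned 32-bit integers (an assumption
    taken from the published VHD specification).
    """
    (offset,) = struct.unpack_from(">I", bat, block_index * 4)
    return None if offset == UNUSED_ENTRY else offset


entries = bat_entry_count(2 * 1024**3, 2 * 1024**2)
print(entries, bat_size_bytes(entries))  # 1024 4096

bat = struct.pack(">3I", 0x10, UNUSED_ENTRY, 0x1010)
print(block_sector_offset(bat, 0), block_sector_offset(bat, 1))  # 16 None
```

The 2-GB / 2-MB example from the text yields 1024 entries occupying 4096 bytes, which is already a whole number of sectors.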

In conventional systems, changes to virtual hard disks include changes to one or more data blocks in data blocks 210. Every time a data block in data blocks 210 is changed or modified, an additional VHD (or a “differencing disk”) having the updated data block is created. The time taken to consolidate or merge the changes to data blocks 210 is the total of the time taken to download one or more data blocks 210 in different VHDs, the time taken to modify one or more data blocks 210 based upon the changes, and the time taken to upload one or more data blocks 210 after the changes have been performed and the merging of redundant data blocks is completed. However, such transfers between a local space, e.g., on client devices 110 or server device 104, and cloud vendors C1-Cn are expensive and time consuming, in direct proportion to the number of data blocks 210 and their respective sizes. Generally, the term downloading, in some implementations, refers to data received from cloud vendors C1-Cn at server device 104 and/or client devices 110, either as a response to a request from server device 104 and/or client devices 110 or otherwise. Likewise, the term uploading, in some implementations, refers to sending data (e.g., merged CAT tables) from server device 104 and/or client devices 110 to one or more of cloud vendors C1-Cn, e.g., after merging the CAT tables.

Instead, in some implementations, every block in data blocks 210 of the VHD is stored in a separate cloud unit storage object (e.g., an S3® object), for example in a distributed fashion. As noted above, each cloud object is a file that now stores data in data blocks 210. The distributed data blocks are then addressed using one or more cloud allocation tables 102 (or, CAT tables 102) instead of BAT 208. As discussed below with respect to FIG. 3, operations are then performed on CAT tables 102, which are substantially smaller than data blocks 210 themselves; this eliminates the need for expensive downloading and overwriting of blocks in data blocks 210.

FIG. 3 is an illustration of a format of file 200 modified to include one or more cloud allocation tables 102 to form an electronic file 300 that represents a dynamic virtual hard disk. In some implementations, a CAT table may be a data structure, stored in a storage device or memory, that includes one or more identifiers or pointers to data blocks stored as distributed cloud objects. Such a representation of a dynamic VHD that includes CAT tables 102 may be stored at server 104, for example, or at any other storage device such as those provided by cloud vendors C1-Cn. In some implementations, electronic file 300 includes dynamic disk header fields 202, copy of hard disk footer 204, dynamic disk header 206, and one or more data blocks 210, each stored in a unique cloud object. However, in some implementations, conventional BAT 208 of file 200 is replaced in file 300 by one or more cloud allocation tables 102 (referred to hereinafter in singular form as cloud allocation table 102). In some implementations, cloud allocation table (CAT) 102 itself is stored as one or more cloud objects depending upon a size of CAT table 102.
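As an illustration of addressing blocks through a CAT instead of a BAT, the Python sketch below resolves a byte range of the virtual disk to cloud objects. The `get_object` callback stands in for a vendor download call (for example, an S3® GET); it, the 2-MB default block size, and the function name `read_virtual` are assumptions, and a CAT entry is reduced here to just an object identifier or `None`.

```python
from typing import Callable, List, Optional

BLOCK_SIZE = 2 * 1024 * 1024  # assumed block size (2 MB)


def read_virtual(cat: List[Optional[str]],
                 get_object: Callable[[str], bytes],
                 offset: int, length: int,
                 block_size: int = BLOCK_SIZE) -> bytes:
    """Read `length` bytes at virtual-disk `offset` by way of a CAT.

    Each CAT entry holds the identifier of the cloud object that stores one
    data block, or None when the block was never allocated; such a block
    reads back as zeros.
    """
    out = bytearray()
    while length > 0:
        index, within = divmod(offset, block_size)
        take = min(length, block_size - within)  # stay inside this block
        object_id = cat[index]
        if object_id is None:
            out += bytes(take)
        else:
            out += get_object(object_id)[within:within + take]
        offset += take
        length -= take
    return bytes(out)


# Tiny 4-byte blocks keep the example readable.
store = {"obj_1": b"ABCD", "obj_2": b"EFGH"}
cat = ["obj_1", None, "obj_2"]
data = read_virtual(cat, store.__getitem__, offset=2, length=8, block_size=4)
print(data)  # b'CD\x00\x00\x00\x00EF'
```

Where a BAT entry yields a sector offset into one file, a CAT entry yields the name of an independent object, so changing one block never requires rewriting the others.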

FIG. 4 is an illustration of a format of a header 400 for a recovery point, according to various implementations of the invention. In some implementations, header 400 includes a validation data structure 402, one or more recovery points 404, a next object identifier field 406, and a disk information field 408 having one or more pointers to one or more disk information identifiers 410. In one implementation, for every recovery point on cloud vendors C1-Cn, there exists a corresponding file 300 and, therefore, a unique header 400.

In some implementations, validation structure 402 includes data structures for calculating a checksum of header 400, a modification time field, and other reserved fields. In some implementations, one or more recovery points 404 each include at least one VHD image. The number of recovery points is calculated as the sum of the number ‘n’ of child disks of the VHD and the base disk (i.e., a total of n+1 recovery points). In some implementations, next object identifier field 406 is used when header 400 spans multiple objects, in which case next object identifier field 406 points to the next object identifier in CAT 102. In some implementations, disk information field 408 includes pointers 408(1)-408(n) that point to corresponding disk information fields 410, as indicated by arrows in FIG. 4. Disk information field 408 is described in more detail in FIG. 5. In some implementations, disk information identifiers include information related to the most recent child disks in reverse chronological order, although other arrangements may be possible.
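A minimal in-memory model of header 400 is sketched below in Python, keeping disks newest child first and deriving the n+1 recovery-point count from the list length. The class and field names are hypothetical, and the CRC-32 over a JSON encoding is only an assumed stand-in for the unspecified checksum in validation structure 402.

```python
import json
import time
import zlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiskInfo:
    """One disk information entry (compare FIG. 5)."""
    header_footer_object_id: str  # object holding this VHD's header and footer
    cat_object_ids: List[str]     # objects holding this VHD's CAT


@dataclass
class CloudVHDHeader:
    """Sketch of header 400: disks are kept newest child first, base last."""
    disks: List[DiskInfo] = field(default_factory=list)
    next_object_id: Optional[str] = None  # used only if the header spans objects
    modified: float = 0.0                 # validation data: modification time
    checksum: int = 0                     # validation data: assumed CRC-32

    @property
    def recovery_points(self) -> int:
        """n child disks plus one base disk gives n + 1 recovery points."""
        return len(self.disks)

    def add_disk(self, info: DiskInfo) -> None:
        """Record a new (most recent) child disk at the front."""
        self.disks.insert(0, info)
        self._seal()

    def remove_disk(self, header_footer_object_id: str) -> None:
        """Drop a disk's entry, e.g. after it has been merged away."""
        self.disks = [d for d in self.disks
                      if d.header_footer_object_id != header_footer_object_id]
        self._seal()

    def _seal(self) -> None:
        """Refresh the validation fields after any change."""
        payload = json.dumps([d.__dict__ for d in self.disks]).encode()
        self.checksum = zlib.crc32(payload)
        self.modified = time.time()


header = CloudVHDHeader()
header.add_disk(DiskInfo("base_hf", ["base_cat_0"]))
for n in range(1, 4):
    header.add_disk(DiskInfo(f"child{n}_hf", [f"child{n}_cat_0"]))
print(header.recovery_points, header.disks[0].header_footer_object_id)  # 4 child3_hf
```

With three child disks and one base disk the header reports 3 + 1 = 4 recovery points; removing a disk's entry after a merge decrements that count.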

FIG. 5 illustrates disk information field 408 in more detail. In some implementations, disk information field 408 includes a header footer object identifier 502, an object identifiers number field 504, and object identifiers 504(1)-504(n). Header footer object identifier 502 includes an object identifier corresponding to a VHD's header and footer. Object identifiers 504(1)-504(n) facilitate storage of CAT table 102. A VHD on a host file system is stored as one file. However, a VHD at cloud vendors C1-Cn is stored in multiple cloud objects, which are units of data in a cloud storage environment. For example, an object can be an Amazon Object® provided by Amazon.com, Inc. of Seattle, Wash., or a blob in Azure® provided by Microsoft Corporation of Redmond, Wash. For each VHD in the parent chain, the VHD headers (header and footer) are stored in one cloud object. Each data block in the VHD is saved as a cloud object, and the corresponding cloud object identifier is stored in CAT 102 at the corresponding entry.
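Because CAT 102 is itself stored as one or more cloud objects whose identifiers are listed in the disk information field, a Python sketch of splitting a CAT across objects and reassembling it follows. JSON encoding, the `<prefix>_cat_<n>` naming, and the 64-byte limit (chosen only so that the example splits) are assumptions; real vendor object-size limits are far larger.

```python
import json
from typing import Callable, Dict, List, Optional

MAX_OBJECT_BYTES = 64  # hypothetical per-object limit, tiny for illustration


def store_cat(cat: List[Optional[str]], prefix: str,
              put_object: Callable[[str, bytes], None],
              max_bytes: int = MAX_OBJECT_BYTES) -> List[str]:
    """Serialize a CAT and spread it over as many objects as its size needs.

    Returns the object identifiers to record in the disk information field.
    """
    payload = json.dumps(cat).encode()
    ids = []
    for part, start in enumerate(range(0, len(payload), max_bytes)):
        object_id = f"{prefix}_cat_{part}"
        put_object(object_id, payload[start:start + max_bytes])
        ids.append(object_id)
    return ids


def load_cat(ids: List[str],
             get_object: Callable[[str], bytes]) -> List[Optional[str]]:
    """Reassemble a CAT from the objects listed in the disk information field."""
    return json.loads(b"".join(get_object(i) for i in ids))


store: Dict[str, bytes] = {}
cat = [f"VMname_UUID_DISKID_{i}" for i in range(1, 6)] + [None]
ids = store_cat(cat, "disk1", store.__setitem__)
print(ids, load_cat(ids, store.__getitem__) == cat)
```

Only these small CAT objects, not the data-block objects, need to travel when tables are later downloaded, merged, and uploaded.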

FIG. 6 is an illustration of an exemplary structure of cloud allocation table 102. In some implementations, CAT 102 contains as many entries as there are entries in conventional VHD BAT 208 shown in FIG. 2. The structure shown in FIG. 6 indicates a one-to-one mapping between cloud object identifiers 610 and one or more data blocks 210. For example, a first data block (“Data Block #1”) in one or more data blocks 210 corresponds to a first cloud object identifier (“VMname_UUID_DISKID_1”) among cloud object identifiers 610. If there is no entry in BAT 208 in file 200 for a given block number among one or more data blocks 210 in file 200 of FIG. 2, then a corresponding cloud object identifier in cloud object identifiers 610 is not created, as indicated, for example, by a “NULL” indicator for “Data Block #3” in FIG. 6. In some implementations, cloud object identifiers 610 each have corresponding Boolean indicators 612 that show whether or not a corresponding data block in data blocks 210 was changed in a particular recovery point or snapshot of the VM image. In some implementations, a sector map 614 indicates the sectors of the VHD, corresponding to each of cloud object identifiers 610, where such changes occurred. In some implementations, sector map 614 is optional. When data of all sectors in a specific block in data blocks 210 is present, sector map 614 may be indicated by all binary 1's when a VHD is restored.
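One possible in-memory shape for a CAT entry is sketched below in Python: an optional object identifier (None standing for “NULL”), the Boolean indicator, and an optional sector map held as a set of sector numbers. The class and helper names, the 4096-sector block (2-MB blocks of 512-byte sectors), and the flag and sector values in the example are assumptions loosely modelled on FIG. 6.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

SECTORS_PER_BLOCK = 4096  # 2-MB blocks of 512-byte sectors (assumed)


@dataclass
class CatEntry:
    object_id: Optional[str]                 # cloud object identifier 610, None for "NULL"
    changed: bool = False                    # Boolean indicator 612 for this recovery point
    sectors: Optional[FrozenSet[int]] = None  # optional sector map 614

    def is_full(self) -> bool:
        """True when every sector of the block holds data (an all-1's map)."""
        return self.sectors is not None and len(self.sectors) == SECTORS_PER_BLOCK


def changed_blocks(cat: List[CatEntry]) -> List[int]:
    """1-based block numbers whose data changed in this recovery point."""
    return [n for n, e in enumerate(cat, start=1)
            if e.object_id is not None and e.changed]


# Loosely modelled on FIG. 6: Data Block #3 has no BAT entry, so no object.
cat = [
    CatEntry("VMname_UUID_DISKID_1", changed=True, sectors=frozenset({42})),
    CatEntry("VMname_UUID_DISKID_2", changed=False),
    CatEntry(None),
    CatEntry("VMname_UUID_DISKID_4", changed=True,
             sectors=frozenset(range(SECTORS_PER_BLOCK))),
]
print(changed_blocks(cat), cat[3].is_full())  # [1, 4] True
```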

FIGS. 7-9 are flowcharts of processes 700-900, respectively, depicting operations performed by one or more components of environment 100. The described operations of processes 700-900 may be accomplished using one or more of the modules/sub-modules described herein and, in some implementations, various operations may be performed in different sequences. In some implementations, additional operations may be performed along with some or all of the operations shown in FIGS. 7-9. In some implementations, one or more operations may be performed simultaneously. In some implementations, one or more of the operations may not be performed. Accordingly, the operations described are exemplary in nature and, as such, should not be viewed as limiting. In some implementations, processes 700-900 are performed using instructions stored on tangible computer-readable media (e.g., memory devices in server 104), which instructions, when executed by one or more processors in server 104 or elsewhere, cause the processors to carry out the operations of processes 700-900.

FIG. 7 illustrates a process 700 for merging recovery points (RPs). In an operation 702, a user requests that a merge of RPs 3 and 2 among RPs 710 of a cloud vendor Ck (where k is an integer) be performed. RPs are point-in-time snapshots of images provided by cloud vendors C1-Cn and store one or more VHDs. In an operation 704, VHD management system 106 downloads CAT tables 710a and 710b corresponding to VHDs of RPs 3 and 2, respectively. It is to be noted that although the logical representations of RPs/VHDs in RPs 710 show one VHD per recovery point, such representation is by way of example only and not by way of limitation. For example, one or more of RPs 710 can each include two or more VHDs. Further, RP1 is denoted as an RP corresponding to a base VHD, and RPs 2-5 are subsequently created RPs of incremental child VHDs storing point-in-time snapshots of images used by a customer (e.g., one or more client devices 110), although such representation is solely illustrative and is not meant to be limiting. In an operation 706, the CAT tables of RPs 3 and 2 are merged, and a new set of RPs 712 is formed having a new CAT table 712a for merged RPs 3,2, as shown in RPs 712 and as discussed below with respect to FIGS. 10A-10C by way of example only.

FIG. 8 illustrates a process 800 for a scenario where five RPs are stored in cloud vendors C1-Cn and a sixth RP is to be created, although such numbers of RPs are presented by way of example only and not by way of limitation. RPs in FIG. 8 are represented logically by VHDs 810. For example, VHDs 810 may be part of RPs 710. In an operation 802, to create a new CAT table, the CAT table of a VHD-5 in VHDs 810 of a virtual machine 808 is retrieved, and the corresponding Boolean indicators 612 are marked with a logical “FALSE” or binary “0” value. A template corresponding to new CAT table 6 is created using the template of the CAT table of VHD-5, as discussed below with respect to FIGS. 10A-10C. In an operation 804, for every block changed in VHD-6, a cloud object is created and its identifier is written into the template of new CAT table 6. Accordingly, CAT table 6 for VHD-6 in VHDs 810 is overwritten by VHD management system 106. It is to be noted that since CAT table 6 has the same template as CAT table 5, other values in the template of CAT table 6 remain unchanged. In an operation 806, VHD management system 106 uploads the new CAT table 6 of VHD-6 to cloud vendors C1-Cn. As a result, only CAT table 6, and not the actual data blocks 210, is downloaded, merged, and uploaded.
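Operations 802 and 804 can be sketched in Python as follows. The `put_object` and `new_object_id` callbacks are hypothetical stand-ins for a vendor upload call and an identifier scheme, blocks are indexed from 0, and uploading the finished CAT (operation 806) is left out.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class CatEntry:
    object_id: Optional[str]  # cloud object holding the block, None if unallocated
    changed: bool             # "IsChangedinThisrecoveryPoint"


def create_next_cat(previous: List[CatEntry],
                    changed_data: Dict[int, bytes],
                    put_object: Callable[[str, bytes], None],
                    new_object_id: Callable[[int], str]) -> List[CatEntry]:
    """Build the CAT of a new recovery point from the previous one.

    Operation 802: copy the previous CAT as a template with every flag False.
    Operation 804: for each changed block, upload a new object and overwrite
    that entry with its identifier and a True flag. Untouched entries keep
    pointing at the parent's objects, so no unchanged data is copied.
    """
    cat = [replace(entry, changed=False) for entry in previous]
    for index, data in sorted(changed_data.items()):
        object_id = new_object_id(index)
        put_object(object_id, data)
        cat[index] = CatEntry(object_id, True)
    return cat


store: Dict[str, bytes] = {}
cat5 = [CatEntry("ObjectID #1", True), CatEntry("ObjectID #2", True),
        CatEntry("ObjectID #3", True)]
cat6 = create_next_cat(cat5, {0: b"block 1 data"}, store.__setitem__,
                       lambda index: "ObjectID #4")
print([(e.object_id, e.changed) for e in cat6])
# [('ObjectID #4', True), ('ObjectID #2', False), ('ObjectID #3', False)]
```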

FIG. 9 illustrates a process 900 for determining when a merging of RPs corresponding to a base/parent VHD and a child VHD may happen. In an operation 902, if a user at client devices 110 explicitly selects two RPs among RPs 906 for merging, then merging is carried out based on such a merge trigger obtained from the user. In another operation 904, a user at client devices 110 configures a maximum number of RPs (e.g., 5). Once this limit is reached, a merge operation is carried out for every new RP that is to be uploaded to cloud vendors C1-Cn. For example, a new RP6 in RPs 908 causes RPs 1 and 2 in RPs 906 to merge, as shown by RP 2,1 in RPs 908. Because merging RPs involves merging only the corresponding CAT tables in those RPs, significant cost savings are achieved when compared with conventional merging of the actual snapshot data in data blocks 210.
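The two merge triggers of process 900 can be modeled over a simple list of recovery-point labels, oldest first. In this Python sketch, `merge_pair` covers an explicit user selection (operation 902, and the RPs 3,2 example of FIG. 7) and `add_recovery_point` covers the configured maximum (operation 904); the labels and function names are hypothetical, and merging the two oldest RPs on overflow simply follows the FIG. 9 example.

```python
from typing import List


def merge_pair(rps: List[str], older: int) -> List[str]:
    """Merge rps[older] with the next (newer) RP into one entry such as "2,1"."""
    merged = f"{rps[older + 1]},{rps[older]}"
    return rps[:older] + [merged] + rps[older + 2:]


def add_recovery_point(rps: List[str], new_rp: str, max_rps: int) -> List[str]:
    """Append a new RP, merging the two oldest when the limit is exceeded."""
    rps = rps + [new_rp]
    if len(rps) > max_rps:
        rps = merge_pair(rps, 0)
    return rps


print(add_recovery_point(["1", "2", "3", "4", "5"], "6", max_rps=5))
# ['2,1', '3', '4', '5', '6']
print(merge_pair(["1", "2", "3", "4", "5"], 1))
# ['1', '3,2', '4', '5']
```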

FIGS. 10A-10C illustrate an example implementation where cloud allocation tables of a base disk (or, parent disk) and a child disk are merged to result in an updated cloud allocation table with changes from both the base and the child disks incorporated therein. FIG. 10A illustrates the CAT table of a base VHD with cloud object identifiers (similar to cloud object identifiers 610) and Boolean indicators (similar to Boolean indicators 612) for the corresponding cloud object identifiers. By way of example only and not by way of limitation, assume that the base VHD contains Block #1, Block #2, and Block #3 (similar to data blocks 210) and that only sector #42 is valid in Block #1. Block #1, Block #2, and Block #3 are replicated to ObjectID #1, ObjectID #2, and ObjectID #3 (similar to cloud object identifiers 610) in cloud vendors C1-Cn. For the base VHD, CAT 102, with metadata to store cloud object identifiers 610 for each valid block of that disk, is created at VHD management system 106 of server 104 or at other storage devices such as those in cloud vendors C1-Cn. CAT 102 contains ObjectID #1, ObjectID #2, ObjectID #3 and corresponding Boolean indicators 612. For the base VHD, a parameter (“IsChangedinThisrecoveryPoint”) that indicates changes in this base VHD is set to a “true” (or, “T”) value for all valid entries of CAT 102.
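Creation of the base-disk CAT of FIG. 10A can be sketched in Python as below. The `put` callback is a hypothetical stand-in for a vendor upload that returns the new object's identifier; an empty sector set here simply means that no sector map was recorded for that block, since sector map 614 is optional.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass
class CatEntry:
    object_id: Optional[str]
    changed: bool = False                   # "IsChangedinThisrecoveryPoint"
    sectors: FrozenSet[int] = frozenset()   # empty = no sector map recorded


def create_base_cat(blocks: List[Optional[bytes]],
                    sector_maps: Dict[int, FrozenSet[int]],
                    put_object: Callable[[bytes], str]) -> List[CatEntry]:
    """Replicate every valid block of a base VHD to its own cloud object.

    All valid entries are flagged True because, for the base disk, every
    block counts as changed in this recovery point.
    """
    cat = []
    for index, data in enumerate(blocks):
        if data is None:
            cat.append(CatEntry(None))
        else:
            cat.append(CatEntry(put_object(data), True,
                                sector_maps.get(index, frozenset())))
    return cat


store: Dict[str, bytes] = {}


def put(data: bytes) -> str:
    """Hypothetical upload that names objects ObjectID #1, #2, ..."""
    object_id = f"ObjectID #{len(store) + 1}"
    store[object_id] = data
    return object_id


base = create_base_cat([b"b1", b"b2", b"b3"], {0: frozenset({42})}, put)
print([(e.object_id, "T" if e.changed else "F", sorted(e.sectors)) for e in base])
# [('ObjectID #1', 'T', [42]), ('ObjectID #2', 'T', []), ('ObjectID #3', 'T', [])]
```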

FIG. 10B illustrates CAT 102 of a child VHD. Assume that, in the child VHD, only sector #1 is modified in Block #1. When the child disk is generated in one of cloud vendors C1-Cn, each block in the child disk is a consolidation of all its parent's changes and its own changes. Therefore, Block #1 should contain both sector #1 and sector #42. To generate CAT 102 of the child disk, CAT 102 of the parent VHD is taken and all Boolean indicators 612 are marked as “False” (i.e., “IsChangedinThisrecoveryPoint” variables are changed to “F”). A new cloud object is created with the data of Block #1, the corresponding entry of cloud object identifiers 610 is overwritten with the newly created identifier, and the corresponding Boolean indicator is changed to “True” (i.e., “IsChangedinThisrecoveryPoint” is set to “T”). Thus, for each valid block in the child disk, the corresponding cloud object identifiers 610 are obtained and overwritten at the corresponding entries of CAT 102. For example, ObjectID #1 is overwritten with ObjectID #4, and sector map 614 is updated to contain both sector #1 and sector #42, with the corresponding Boolean indicator in Boolean indicators 612 changed to “true” (i.e., “IsChangedinThisrecoveryPoint” changed to “T”), resulting in CAT 102 of the child disk containing ObjectID #4, ObjectID #2, and ObjectID #3, as shown in FIG. 10B.
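When the child's Block #1 is written as a new object, it must carry the parent's sector #42 as well as the child's own sector #1. The Python sketch below shows that consolidation of block data and sector map; it assumes 512-byte sectors and an in-memory copy of the parent block, and the function name `consolidate_block` is hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

SECTOR_SIZE = 512


def consolidate_block(parent_data: bytes,
                      parent_sectors: FrozenSet[int],
                      child_writes: Dict[int, bytes]) -> Tuple[bytes, FrozenSet[int]]:
    """Build a child block holding both the parent's and the child's sectors.

    `child_writes` maps a sector number within the block to its new 512-byte
    contents. Returns the consolidated block data and the merged sector map
    (union of the parent's sectors and the sectors the child wrote).
    """
    block = bytearray(parent_data)
    for sector, data in child_writes.items():
        start = sector * SECTOR_SIZE
        if len(data) != SECTOR_SIZE or start + SECTOR_SIZE > len(block):
            raise ValueError("each write must be one whole sector inside the block")
        block[start:start + SECTOR_SIZE] = data
    return bytes(block), parent_sectors | frozenset(child_writes)


# A small 64-sector block: the parent wrote sector 42, the child writes sector 1.
parent = bytearray(64 * SECTOR_SIZE)
parent[42 * SECTOR_SIZE:43 * SECTOR_SIZE] = b"P" * SECTOR_SIZE
data, sectors = consolidate_block(bytes(parent), frozenset({42}),
                                  {1: b"C" * SECTOR_SIZE})
print(sorted(sectors), data[SECTOR_SIZE:SECTOR_SIZE + 1],
      data[42 * SECTOR_SIZE:42 * SECTOR_SIZE + 1])  # [1, 42] b'C' b'P'
```

The identifier of the object holding this consolidated block is what overwrites the Block #1 entry (ObjectID #1 becoming ObjectID #4) in the child CAT.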

FIG. 10C illustrates the resulting CAT 102 after a merge operation performed according to implementations of processes 700-900. In FIG. 10C, only the metadata in the form of CAT 102 is merged, not the actual data. Further, delete operations generally may not incur any cost and can therefore be performed freely by VHD management system 106. After downloading only the CATs (e.g., CAT 102) from cloud vendors C1-Cn, VHD management system 106 modifies the CAT of the child disk, which is the later of the two disks being merged (i.e., the parent VHD and the child VHD). VHD management system 106 iterates over the CATs of both the base/parent VHD and the child VHD and deletes every cloud object of the parent disk whose cloud object identifier 610 has been overwritten by a cloud object identifier of the child VHD. That is, at each iteration, if “IsChangedinThisrecoveryPoint” is “T” for both the child VHD and the parent VHD, the cloud object belonging to the parent VHD's data block 210 is deleted. In the example of FIGS. 10A and 10B, “IsChangedinThisrecoveryPoint” is “T” in both the child disk and the base disk only for data block #1. Therefore, the object corresponding to data block #1 of the base disk, namely ObjectID #1, is deleted.
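The deletion pass described above touches only the two CATs, never the block data. A minimal sketch, assuming a hypothetical delete callback that removes one cloud object by identifier (the function name delete_superseded_objects is likewise illustrative):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CatEntry:
    object_id: str
    is_changed: bool  # models "IsChangedinThisrecoveryPoint"


Cat = Dict[int, CatEntry]


def delete_superseded_objects(parent: Cat, child: Cat,
                              delete: Callable[[str], None]) -> List[str]:
    """Delete parent objects that the child has overwritten.

    Only the two CATs are inspected; no block data is downloaded. `delete`
    is an assumed callback that removes one cloud object by identifier.
    """
    deleted: List[str] = []
    for block_no in sorted(parent):
        p = parent[block_no]
        c = child.get(block_no)
        # Flag "T" in both CATs means the child replaced the parent's object.
        if c is not None and p.is_changed and c.is_changed:
            delete(p.object_id)
            deleted.append(p.object_id)
    return deleted


if __name__ == "__main__":
    base = {1: CatEntry("ObjectID #1", True),
            2: CatEntry("ObjectID #2", True),
            3: CatEntry("ObjectID #3", True)}
    child = {1: CatEntry("ObjectID #4", True),
             2: CatEntry("ObjectID #2", False),
             3: CatEntry("ObjectID #3", False)}
    print(delete_superseded_objects(base, child, lambda oid: None))
```

With the FIG. 10A and FIG. 10B tables, only block #1 carries “T” in both CATs, so only ObjectID #1 is passed to the delete callback; ObjectID #2 and ObjectID #3 remain shared by the merged disk.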

When the child disk is merged into the base disk, VHD management system 106 then marks all “IsChangedinThisrecoveryPoint” parameters as “T”. When two child disks are merged, “IsChangedinThisrecoveryPoint” is set to “T” only when either child disk has “T” for that entry. The modified CAT of the child disk is subsequently uploaded to cloud vendors C1-Cn, overwriting the older CAT. Finally, the CAT and headers of the parent disk are deleted, and the “CloudVHDHeader” (e.g., header 400) is updated by removing the corresponding entry for that disk from disk information fields 408. Accordingly, the number of recovery points (RPs) for the disk is decremented. To recover a VHD from cloud storage environment 100 (e.g., from cloud vendors C1-Cn), the CAT of the corresponding child disk for a particular RP or snapshot is taken, and the VHD is created locally from that CAT by reading the cloud objects referenced by its object identifiers (e.g., cloud object identifiers 610).
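The flag rules, the header update, and the recovery step described above can be sketched together. This is a hedged illustration only: finalize_merged_flags, CloudVhdHeader (a simplified stand-in for header 400), drop_parent_disk, restore_vhd, and the fetch callback are all hypothetical names, and the header is reduced to a dictionary of disk information plus a recovery-point counter.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CatEntry:
    object_id: str
    is_changed: bool  # models "IsChangedinThisrecoveryPoint"


Cat = Dict[int, CatEntry]


@dataclass
class CloudVhdHeader:
    """Hypothetical, simplified stand-in for header 400."""
    disk_info: Dict[str, dict]  # disk information fields, keyed by disk id
    recovery_points: int


def finalize_merged_flags(parent: Cat, child: Cat, parent_is_base: bool) -> Cat:
    """Return the child's CAT with change flags set for the merged disk."""
    merged: Cat = {}
    for block_no, c in child.items():
        if parent_is_base:
            # Merging a child into the base disk: every entry becomes "T".
            flag = True
        else:
            # Merging two child disks: "T" if either disk has "T".
            p = parent.get(block_no)
            flag = c.is_changed or (p is not None and p.is_changed)
        merged[block_no] = CatEntry(c.object_id, flag)
    return merged


def drop_parent_disk(header: CloudVhdHeader, parent_id: str) -> None:
    """Remove the parent's entry from the header and decrement the RP count."""
    del header.disk_info[parent_id]
    header.recovery_points -= 1


def restore_vhd(cat: Cat, fetch: Callable[[str], bytes]) -> Dict[int, bytes]:
    """Rebuild a VHD's blocks locally by reading each referenced object.

    `fetch` is an assumed callback that downloads one cloud object.
    """
    return {block_no: fetch(entry.object_id)
            for block_no, entry in sorted(cat.items())}
```

Only the merged CAT is uploaded afterwards; the blocks themselves stay where they are in the cloud, and restore_vhd is the one place where object data is actually read.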

Cloud computing and virtualization are emerging markets for the coming decades, and they are changing the basic paradigm of information technology infrastructure. Various implementations of the invention disclosed above advantageously optimize the way VHDs (virtual hard disks) are stored and managed in the cloud. In some implementations, such optimization can be part of the Hyper-V® virtual machine manager provided by Microsoft Corporation of Redmond, Wash. Various implementations of the invention save the cost and time required to recover a VHD from a given snapshot. Further, merging two VHDs in cloud environments (e.g., environment 100) is fast and incurs very little cost. By way of example only and not by way of limitation, in a scenario where snapshots have changed and must be updated and/or merged, the time taken for merging = time taken to download the metadata files + time taken to manipulate the metadata files + time taken to upload the metadata files. Because metadata files (e.g., file 300 having CAT 102) are substantially smaller than the actual data files, performing operations (e.g., those disclosed in processes 700-900) on such files allows recovery points and snapshot data to be merged quickly. Processes 700-900 may be applied to any virtual disks that adhere to the VHD specification, although the implementations may be modified for other forms of virtual hard disks.
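The timing relationship above can be written out term by term. The symbols are introduced here only for illustration: T_down, T_proc, and T_up denote the time to download, manipulate, and upload the files named in parentheses; the subscripts p, c, and m denote the parent, child, and merged disk; and the second line restates, for comparison, the conventional download-merge-upload approach described in the Background.

```latex
\begin{aligned}
T_{\mathrm{merge}}^{\mathrm{CAT}}  &= T_{\mathrm{down}}(\mathrm{CAT}_{p}, \mathrm{CAT}_{c})
  + T_{\mathrm{proc}}(\mathrm{CAT}_{p}, \mathrm{CAT}_{c})
  + T_{\mathrm{up}}(\mathrm{CAT}_{m}) \\
T_{\mathrm{merge}}^{\mathrm{conv}} &= T_{\mathrm{down}}(\mathrm{VHD}_{p}, \mathrm{VHD}_{c})
  + T_{\mathrm{proc}}(\mathrm{VHD}_{p}, \mathrm{VHD}_{c})
  + T_{\mathrm{up}}(\mathrm{VHD}_{m})
\end{aligned}
```

Since a CAT holds one identifier, one flag, and a sector map per block rather than the block data itself, each CAT is far smaller than its VHD; if transfer time grows roughly with size, each term on the first line is expected to be much smaller than the matching term on the second.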

Instead of or in addition to cloud storage, various implementations of this invention can be used, for example, with the Hadoop Distributed File System (HDFS), where a file is used instead of a cloud object. In HDFS, an object identifier may be a file name. Various implementations of this invention can also be used with file systems in which files serve as cloud objects, such that object identifiers are filenames. In some implementations, the disclosure may be extended to store multiple objects in the same file, in which case an object identifier includes both a filename and an offset.
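For the file-based variant, an object identifier is either a bare filename or a filename plus an offset. The sketch below is illustrative only: a local file system stands in for HDFS, FileObjectId and read_object are hypothetical names, and every object is assumed to have the same fixed length (block_size), since the text does not say how an object's length is recorded when several objects share a file.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileObjectId:
    """Object identifier when files stand in for cloud objects.

    `offset` is None when one object occupies a whole file (identifier =
    filename); otherwise several objects share a file (filename + offset).
    """
    filename: str
    offset: Optional[int] = None


def read_object(oid: FileObjectId, block_size: int) -> bytes:
    """Read one object; `block_size` is an assumed fixed object length."""
    with open(oid.filename, "rb") as f:
        if oid.offset is None:
            return f.read()
        f.seek(oid.offset)
        return f.read(block_size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        packed = os.path.join(workdir, "objects.bin")
        with open(packed, "wb") as f:
            f.write(b"AAAABBBB")  # two 4-byte objects packed in one file
        print(read_object(FileObjectId(packed, 4), block_size=4))
```

Because the identifier is a frozen dataclass, it is hashable and could be stored directly in a CAT entry in place of a cloud object identifier string.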

Implementations of the invention may be made in hardware, firmware, middleware, software, or various combinations thereof. The invention may also be implemented as computer-readable instructions stored on a tangible computer-readable storage medium, which may be read and executed by one or more processors. A computer-readable storage medium may include various mechanisms for storing information in a form readable by a computing device. For example, a tangible computer-readable storage medium may include optical storage media, flash memory devices, and/or other storage media. Further, firmware, software, routines, or instructions may be described in the above disclosure in terms of specific exemplary aspects and implementations of the invention, and as performing certain actions. However, it will be apparent that such descriptions are merely for convenience, and that such actions may in fact result from computing devices, processors, controllers, or other devices executing firmware, software, routines, or instructions.

Other implementations, uses, and advantages of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The specification should be considered exemplary only, and the scope of the invention is accordingly intended to be limited only by the following claims.

What is claimed is:
 1. A method for managing virtual hard disks in a cloud computing/storage environment, comprising: associating, using a virtual hard disk (VHD) management system of a server device, a plurality of data blocks of a virtual hard disk stored at a cloud vendor to a corresponding plurality of cloud objects; storing, using the VHD management system, a plurality of cloud object identifiers associated with the plurality of cloud objects in a first cloud allocation table; determining, using the VHD management system, changes to one or more data blocks in the plurality of data blocks; forming, using the VHD management system, corresponding new cloud allocation tables for every data block in the plurality of data blocks that has changed, the new cloud allocation tables having corresponding new cloud object identifiers; downloading, using the VHD management system, the first and the new cloud allocation tables; merging, at the VHD management system, the first and the new cloud allocation tables to form an updated cloud allocation table; and uploading, using the VHD management system, the updated cloud allocation table to the cloud vendor such that the updated cloud allocation table includes information regarding the changed data blocks in the plurality of data blocks.
 2. The method of claim 1, wherein the merging comprises: detecting updated cloud object identifiers in the first and the new cloud allocation tables to form updated cloud object identifiers in the updated cloud allocation table.
 3. The method of claim 1, wherein the determining the changes is based upon determining which sectors of the one or more data blocks have changed.
 4. The method of claim 3, wherein the cloud object identifiers are Boolean indicators in the first and the new cloud allocation tables.
 5. The method of claim 1, wherein for each data block that is changed in the one or more data blocks, the new cloud object identifiers are created.
 6. The method of claim 1, wherein the plurality of data blocks store respective snapshots of virtual machine images.
 7. The method of claim 1, wherein the first cloud allocation and the new cloud allocation tables are smaller in size than the data blocks such that a time taken for the downloading is less than a time taken for downloading the data blocks.
 8. A tangible computer-readable storage medium having one or more computer-readable instructions thereon for managing virtual hard disks in a cloud computing/storage environment, which when executed by one or more processors cause the one or more processors to: associate, using a virtual hard disk (VHD) management system of a server device, a plurality of data blocks of a virtual hard disk stored at a cloud vendor to a corresponding plurality of cloud objects; store, using the VHD management system, a plurality of cloud object identifiers associated with the plurality of cloud objects in a first cloud allocation table; determine, using the VHD management system, changes to one or more data blocks in the plurality of data blocks; form, using the VHD management system, corresponding new cloud allocation tables for every data block in the plurality of data blocks that has changed, the new cloud allocation tables having corresponding new cloud object identifiers; download, using the VHD management system, the first and the new cloud allocation tables; merge, at the VHD management system, the first and the new cloud allocation tables to form an updated cloud allocation table; and upload, using the VHD management system, the updated cloud allocation table to the cloud vendor such that the updated cloud allocation table includes information regarding the changed data blocks in the plurality of data blocks.
 9. The tangible computer-readable storage medium of claim 8, wherein the one or more processors are caused to merge by: detecting updated cloud object identifiers in the first and the new cloud allocation tables to form updated cloud object identifiers in the updated cloud allocation table.
 10. The tangible computer-readable storage medium of claim 8, wherein the one or more processors are caused to determine the changes based upon determining which sectors of the one or more data blocks have changed.
 11. The tangible computer-readable storage medium of claim 10, wherein the cloud object identifiers are Boolean indicators in the first and the new cloud allocation tables.
 12. The tangible computer-readable storage medium of claim 8, wherein for each data block that is changed in the one or more data blocks, the new cloud object identifiers are created.
 13. The tangible computer-readable storage medium of claim 8, wherein the plurality of data blocks store respective snapshots of virtual machine images.
 14. The tangible computer-readable storage medium of claim 8, wherein the first cloud allocation and the new cloud allocation tables are smaller in size than the data blocks such that a time taken for the downloading is less than a time taken for downloading the data blocks.
 15. A system for managing virtual hard disks in a cloud computing/storage environment, the system comprising: one or more processors configured to: associate, using a virtual hard disk (VHD) management system of a server device, a plurality of data blocks of a virtual hard disk stored at a cloud vendor to a corresponding plurality of cloud objects; store, using the VHD management system, a plurality of cloud object identifiers associated with the plurality of cloud objects in a first cloud allocation table; determine, using the VHD management system, changes to one or more data blocks in the plurality of data blocks; form, using the VHD management system, corresponding new cloud allocation tables for every data block in the plurality of data blocks that has changed, the new cloud allocation tables having corresponding new cloud object identifiers; download, using the VHD management system, the first and the new cloud allocation tables; merge, at the VHD management system, the first and the new cloud allocation tables to form an updated cloud allocation table; and upload, using the VHD management system, the updated cloud allocation table to the cloud vendor such that the updated cloud allocation table includes information regarding the changed data blocks in the plurality of data blocks.
 16. The system of claim 15, wherein the one or more processors are caused to merge by: detecting updated cloud object identifiers in the first and the new cloud allocation tables to form updated cloud object identifiers in the updated cloud allocation table.
 17. The system of claim 15, wherein the one or more processors are caused to determine the changes based upon determining which sectors of the one or more data blocks have changed.
 18. The system of claim 17, wherein the cloud object identifiers are Boolean indicators in the first and the new cloud allocation tables.
 19. The system of claim 15, wherein for each data block that is changed in the one or more data blocks, the new cloud object identifiers are created.
 20. The system of claim 15, wherein the plurality of data blocks store respective snapshots of virtual machine images.
 21. The system of claim 15, wherein the first cloud allocation and the new cloud allocation tables are smaller in size than the data blocks such that a time taken for the downloading is less than a time taken for downloading the data blocks.